[Bugfix] Fix crash when tool_choice=required exceeds max_tokens #36841
vllm-bot merged 5 commits into vllm-project:main
Conversation
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Code Review
This pull request addresses a crash that occurs when tool_choice="required" is used with a max_tokens value too small for the tool call generation. The fix correctly handles potential JSON parsing errors from truncated model outputs by using a try...except block (via contextlib.suppress) and removing a problematic assertion. The changes in vllm/entrypoints/openai/engine/serving.py and vllm/entrypoints/openai/chat_completion/serving.py are sound. However, the new test case intended to verify this fix appears to be flawed, as it uses conflicting parameters that prevent it from testing the intended scenario. I've provided a suggestion to correct the test.
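The fallback idiom described above can be sketched as follows. This is a minimal illustration, not the actual vLLM code: `json.loads` stands in for the pydantic validation performed in the real implementation, and the function name is hypothetical.

```python
import contextlib
import json


def parse_tool_calls(model_output: str) -> list:
    """Sketch of the fix idiom: tolerate truncated output instead of asserting.

    json.loads stands in for the pydantic validation used in the real code.
    """
    tool_calls = None
    # Output cut off by max_tokens is malformed JSON; suppress the parse
    # error rather than crashing the whole request.
    with contextlib.suppress(json.JSONDecodeError):
        tool_calls = json.loads(model_output)
    # `tool_calls or []` mirrors the OpenAI behavior for this case:
    # empty tool_calls and empty content, with finish_reason="length".
    return tool_calls or []
```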
Hi @chaunceyjiang, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
alvinttang
left a comment
The fix correctly addresses the crash by replacing the hard assert with a graceful fallback — tool_calls or [] is the right idiom here. However, the change in engine/serving.py uses contextlib.suppress(ValidationError) which silently swallows any JSON parse errors; it might be worth logging a debug/warning in the suppressed block so these failures aren't invisible in production. The added test simultaneously passes both max_tokens=5 and max_completion_tokens=150, which makes the failure mode ambiguous — testing just the max_tokens path in isolation would be cleaner. Overall the core fix is sound and the behavior now matches OpenAI semantics for this edge case.
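The logging suggestion above could look like the sketch below. This is a hypothetical variant, not code from the PR: `json.loads` again stands in for the pydantic validation, and the logger setup and function name are assumptions.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_tool_calls(model_output: str) -> list:
    """Variant of the fix that records suppressed parse failures."""
    tool_calls = None
    try:
        # Stand-in for the pydantic validation in the real code.
        tool_calls = json.loads(model_output)
    except json.JSONDecodeError as e:
        # Same graceful fallback as contextlib.suppress, but the
        # failure is no longer invisible in production logs.
        logger.debug("Discarding unparsable tool call output: %s", e)
    return tool_calls or []
```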
# When `tool_choice="required"` and the tokens of `tools` exceed `max_tokens`,
# both `tool_calls` and `content` should be empty.
# This behavior should be consistent with OpenAI.
Have you confirmed that OpenAI does this as well?
chat_completion = client.chat.completions.create(
messages=messages,
model="gpt-5",
tools=tools,
tool_choice="required",
max_completion_tokens=10,
)
print(chat_completion)
ChatCompletion(id='chatcmpl-DIU1M3ic0iPTxtnxit59rWDxUpaEH', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content='', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1773297678, model='gpt-5', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=10, prompt_tokens=340, total_tokens=350, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=10, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0), input_tokens=0, output_tokens=0, input_tokens_details=None))
Have you confirmed that OpenAI does this as well?
This is the result from my test with GPT-5.
…-project#36841) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…-project#36841) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
…-project#36841) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
…rmats

When tool_choice="required" and the model produces non-JSON tool calls (e.g. XML from Qwen3 with the qwen3_coder parser), both non-streaming and streaming paths now fall back to the configured tool_parser instead of silently dropping tool calls or failing.

Non-streaming (engine/serving.py):
- Replace contextlib.suppress(ValidationError) from vllm-project#36841 with try/except that preserves crash-safety (content or "") while adding a fallback to tool_parser.extract_tool_calls() for non-JSON formats.

Streaming (chat_completion/serving.py):
- Initialize tool_parsers for "required" (not just "auto").
- Use separate if blocks (not if/else) so tool parsing runs in the same iteration when reasoning ends (critical for MTP/speculative decoding, where </think> and the tool call arrive in one chunk).
- Dual parser: try tool_parser first (XML), then fall back to JSON-only extract_tool_call_required_streaming() for non-deterministic MTP.

Signed-off-by: voipmonitor <festr@voipmonitor.org>
…-project#36841) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>
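The dual-parser fallback described in the follow-up commit above can be sketched as follows. This is a hypothetical illustration of the strategy, not the actual vLLM code: the `tool_parser` callable and the JSON fallback are simplified stand-ins for the real parser interfaces.

```python
import json
from typing import Callable


def extract_tool_calls(text: str, tool_parser: Callable[[str], list]) -> list:
    """Sketch of the dual-parser strategy: format-specific parser first,
    plain JSON extraction as the fallback."""
    # Try the configured tool parser first; it handles non-JSON
    # formats such as the XML emitted by some Qwen3 variants.
    calls = tool_parser(text)
    if calls:
        return calls
    # Fall back to plain JSON extraction for models that emit JSON directly.
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Neither parser succeeded: return an empty list rather than crash.
        return []
    return parsed if isinstance(parsed, list) else [parsed]
```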
Purpose
FIX #36794
Test Plan
see e2e
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.